1,154 research outputs found

    Fitting multiplicative models by robust alternating regressions.

    In this paper a robust approach for fitting multiplicative models is presented. Focus is on the factor analysis model, where we will estimate factor loadings and scores by a robust alternating regression algorithm. The approach is highly robust, and also works well when there are more variables than observations. The technique yields a robust biplot, depicting the interaction structure between individuals and variables. This biplot is not predetermined by outliers, which can be retrieved from the residual plot. Also provided is an accompanying robust $R^2$-plot to determine the appropriate number of factors. The approach is illustrated by real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.
    Keywords: Alternating regression; Approximation; Biplot; Covariance; Dispersion matrices; Effects; Estimator; Exploratory data analysis; Factor analysis; Factors; FANOVA; Least-squares; Matrix; Median polish; Model; Models; Outliers; Principal components; Robustness; Structure; Two-way table; Variables; Yield

    Robust Estimation with Discrete Explanatory Variables

    Outlier Detection Using Nonconvex Penalized Regression

    This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the $n$ data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual $L_1$ penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The $L_1$ penalty corresponds to soft thresholding. We introduce a thresholding (denoted by $\Theta$) based iterative procedure for outlier detection ($\Theta$-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that $\Theta$-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most $O(np)$ (and sometimes much less), avoiding an $O(np^2)$ least squares estimate. We describe the connection between $\Theta$-IPOD and $M$-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned $\Theta$-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with $p \gg n$, if both the coefficient vector and the outlier pattern are sparse.
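    The mean-shift idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`hard_threshold`, `ipod_sketch`), the zero starting value, the fixed threshold `lam`, and the toy data are all assumptions made for illustration, and the BIC-based tuning mentioned in the abstract is not shown. The sketch alternates a least-squares fit on the shift-corrected response with hard thresholding of the residuals; after a one-time QR factorization, each pass in this sketch costs O(np).

```python
import numpy as np


def hard_threshold(r, lam):
    """Keep entries whose magnitude exceeds lam; set the rest to zero."""
    return np.where(np.abs(r) > lam, r, 0.0)


def ipod_sketch(X, y, lam, n_iter=100, tol=1e-8):
    """Alternate between fitting y - gamma and thresholding the residuals.

    Model assumed here: y = X @ beta + gamma + noise, with gamma sparse.
    Starting from gamma = 0 is an illustrative choice, not a recommendation.
    """
    n = X.shape[0]
    gamma = np.zeros(n)
    Q, R = np.linalg.qr(X)  # reduced QR, computed once
    for _ in range(n_iter):
        fitted = Q @ (Q.T @ (y - gamma))      # projection of y - gamma onto col(X)
        gamma_new = hard_threshold(y - fitted, lam)
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    beta = np.linalg.solve(R, Q.T @ (y - gamma))  # coefficients for the final gamma
    return beta, gamma


# Toy data (hypothetical): a line with intercept 1 and slope 2, plus five shifted points.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(50)
outlier_idx = [5, 15, 25, 35, 45]
y[outlier_idx] += 10.0

beta_hat, gamma_hat = ipod_sketch(X, y, lam=3.0)
flagged = np.flatnonzero(gamma_hat)  # indices with a nonzero mean shift
print("flagged:", flagged, "beta:", beta_hat)
```

    Observations with a nonzero entry in `gamma_hat` are the ones flagged as outliers; replacing `hard_threshold` with a soft-thresholding rule would give the $L_1$ variant that the abstract reports as non-robust.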

    Predicting deadline transgressions using event logs

    Effective risk management is crucial for any organisation. One of its key steps is risk identification, but few tools exist to support this process. Here we present a method for the automatic discovery of a particular type of process-related risk, the danger of deadline transgressions or overruns, based on the analysis of event logs. We define a set of time-related process risk indicators, i.e., patterns observable in event logs that highlight the likelihood of an overrun, and then show how instances of these patterns can be identified automatically using statistical principles. To demonstrate its feasibility, the approach has been implemented as a plug-in module to the process mining framework ProM and tested using an event log from a Dutch financial institution.

    Exploring Outliers in Crowdsourced Ranking for QoE

    Outlier detection is a crucial part of robust evaluation for crowdsourceable assessment of Quality of Experience (QoE) and has attracted much attention in recent years. In this paper, we propose some simple and fast algorithms for outlier detection and robust QoE evaluation based on the nonconvex optimization principle. Several iterative procedures are designed with or without knowing the number of outliers in samples. Theoretical analysis is given to show that such procedures can reach statistically good estimates under mild conditions. Finally, experimental results with simulated and real-world crowdsourcing datasets show that the proposed algorithms achieve performance similar to the Huber-LASSO approach in robust ranking, yet with a speed-up of nearly 8 or 90 times, without or with prior knowledge of the sparsity size of outliers, respectively. The proposed methodology therefore provides a set of helpful tools for robust QoE evaluation with crowdsourcing data.
    Comment: accepted by ACM Multimedia 2017 (Oral presentation). arXiv admin note: text overlap with arXiv:1407.763

    A COMPARISON OF METHODS FOR SELECTING PREFERRED SOLUTIONS IN MULTIOBJECTIVE DECISION MAKING

    ISBN: 978-94-91216-77-0
    In multiobjective optimization problems, the identified Pareto Frontiers and Sets often contain too many solutions, which makes it difficult for the decision maker to select a preferred alternative. To facilitate the selection task, decision making support tools can be used in different instances of the multiobjective optimization search to introduce preferences on the objectives or to give a condensed representation of the solutions on the Pareto Frontier, so as to offer the decision maker a manageable picture of the solution alternatives. This paper presents a comparison of some a priori and a posteriori decision making support methods, aimed at aiding the decision maker in the selection of the preferred solutions. The considered methods are compared with respect to their application to a case study concerning the optimization of the test intervals of the components of a safety system of a nuclear power plant. The engine for the multiobjective optimization search is based on genetic algorithms.

    Robust high-dimensional precision matrix estimation

    The dependency structure of multivariate data can be analyzed using the covariance matrix $\Sigma$. In many fields the precision matrix $\Sigma^{-1}$ is even more informative. As the sample covariance estimator is singular in high dimensions, it cannot be used to obtain a precision matrix estimator. A popular high-dimensional estimator is the graphical lasso, but it lacks robustness. We consider the high-dimensional independent contamination model. Here, even a small percentage of contaminated cells in the data matrix may lead to a high percentage of contaminated rows. Downweighting entire observations, which is done by traditional robust procedures, would then result in a loss of information. In this paper, we formally prove that replacing the sample covariance matrix in the graphical lasso with an elementwise robust covariance matrix leads to an elementwise robust, sparse precision matrix estimator computable in high dimensions. Examples of such elementwise robust covariance estimators are given. The final precision matrix estimator is positive definite, has a high breakdown point under elementwise contamination and can be computed fast.
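    As a rough illustration of what an elementwise robust covariance matrix can look like, the Python sketch below builds one from per-column MAD scales and rank (Spearman-type) correlations. These two ingredients are assumptions chosen for simplicity; the abstract does not say which estimators the paper uses, and the graphical lasso step that would take this matrix as input is not shown. The function names and the toy contamination scheme are hypothetical.

```python
import numpy as np


def mad_scale(X):
    """Per-column median absolute deviation, scaled for consistency at the normal."""
    med = np.median(X, axis=0)
    return 1.4826 * np.median(np.abs(X - med), axis=0)


def rank_columns(X):
    """Ranks 0..n-1 within each column (ties are broken by position, not averaged)."""
    return np.argsort(np.argsort(X, axis=0), axis=0).astype(float)


def elementwise_robust_cov(X):
    """Covariance built entry by entry from robust scales and rank correlations."""
    scales = mad_scale(X)
    rank_corr = np.corrcoef(rank_columns(X), rowvar=False)
    return rank_corr * np.outer(scales, scales)


# Toy data (hypothetical): three Gaussian variables with about 2% of cells corrupted.
rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.5, 0.0],
                     [0.5, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
X_clean = rng.multivariate_normal(np.zeros(3), true_cov, size=500)
mask = rng.random(X_clean.shape) < 0.02   # independent cellwise contamination
X_cont = X_clean.copy()
X_cont[mask] = 1000.0

S_rob = elementwise_robust_cov(X_cont)
S_emp = np.cov(X_cont, rowvar=False)      # sample covariance, for comparison
print("robust diagonal:", np.diag(S_rob))
print("sample diagonal:", np.diag(S_emp))
```

    In this sketch the contaminated cells inflate the sample covariance diagonal by orders of magnitude, while the rank-based matrix stays close to the scale of the clean data and remains positive semidefinite, since it is a correlation matrix of ranks rescaled by positive constants.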

    Gauge fields, ripples and wrinkles in graphene layers

    We analyze elastic deformations of graphene sheets which lead to effective gauge fields acting on the charge carriers. Corrugations in the substrate induce stresses, which, in turn, can give rise to mechanical instabilities and the formation of wrinkles. Similar effects may take place in suspended graphene samples under tension.
    Comment: contribution to the special issue of Solid State Communications on graphene